[rllib] Fix custom model-config mismatch between env-runner and learner #58739
Conversation
Signed-off-by: Mark Towers <mark@anyscale.com>
Code Review
This pull request addresses a model configuration mismatch between the environment runner and the learner by correctly merging the algorithm's base model config with the RLModuleSpec's custom config. It also introduces a valuable error check in TorchRLModule.set_state to detect architecture mismatches when loading state into an inference_only module. My review includes a critical fix for the config merging logic to prevent a TypeError when the model config is a dataclass and to ensure compatibility with Python versions older than 3.9.
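For reference, the merge described above could look roughly like the following. This is a minimal sketch, not RLlib's actual internals; `merge_model_config` and its arguments are illustrative names.

```python
import dataclasses


def merge_model_config(base_config, custom_config):
    """Merge the algorithm's base model config with an RLModuleSpec's custom config."""
    # The base config may be a dataclass instance (e.g. a DefaultModelConfig),
    # which would make dict-style merging raise a TypeError, so convert it first.
    if dataclasses.is_dataclass(base_config):
        base_config = dataclasses.asdict(base_config)
    # Use dict unpacking rather than the `|` operator so the code also runs on
    # Python versions older than 3.9; keys from the custom config take precedence.
    return {**(base_config or {}), **(custom_config or {})}
```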
LGTM
raise ValueError(
    "Architecture mismatch detected when loading state into inference_only module! "
    f"Missing parameters (not found in source state): {list(missing_keys)} "
    "This usually indicates the learner and env-runner have different architectures."
)
Let's please be a little more precise here.
-> What does having a different architecture mean here?
Basically, this error should give the user a good clue about what they're doing wrong.
Good spot, it should probably reference the layer names that are different.
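For example, the message could enumerate the parameter (layer) names that differ, roughly along these lines. This is only a sketch; the wording and helper name here are illustrative, and the final message in the PR may differ.

```python
def architecture_mismatch_error(missing_keys):
    # List the concrete parameter names so the user can see which layers the
    # env-runner module defines but the learner state does not provide.
    return ValueError(
        "Architecture mismatch detected when loading state into an "
        f"inference_only module! Missing parameters: {sorted(missing_keys)}. "
        "This usually means the learner and env-runner built their RLModules "
        "from different model configs."
    )


# Example with hypothetical layer names for a policy-head MLP.
print(architecture_mismatch_error(["pi.net.mlp.2.weight", "pi.net.mlp.2.bias"]))
```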
Signed-off-by: Mark Towers <mark@anyscale.com>
Description
The algorithm config isn't updating `rl_module_spec.model_config` when a custom one is specified, which means the learner and env-runner build their modules from different model configs. As a result, the env-runner model wasn't being updated. The reason this problem wasn't detected previously is that the model state-dict was loaded with `strict=False`. Therefore, I've added an error check that the missing keys must always be empty; it detects when the env-runner module is missing components that are present in the learner's updated model.
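The mismatch can be reproduced with the script below (from this PR's description); before the fix, the learner's and env-runners' `encoder.config.shared` values disagree.

```python
from ray.rllib.algorithms import PPOConfig
from ray.rllib.core.rl_module import RLModuleSpec
from ray.rllib.policy.sample_batch import DEFAULT_POLICY_ID

config = (
    PPOConfig()
    .environment('CartPole-v1')
    .env_runners(
        num_env_runners=0,
        num_envs_per_env_runner=1,
    )
    .rl_module(
        rl_module_spec=RLModuleSpec(
            model_config={
                # This used to cause an encoder.config.shared mismatch.
                "head_fcnet_hiddens": (32,),
            }
        )
    )
)
algo = config.build_algo()

learner_module = algo.learner_group._learner._module[DEFAULT_POLICY_ID]
env_runner_modules = algo.env_runner_group.foreach_env_runner(
    lambda runner: runner.module
)
print(f'{learner_module.encoder.config.shared=}')
print(f'{[mod.encoder.config.shared for mod in env_runner_modules]=}')

algo.train()
```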
Related issues
Closes #58715